Identification and validation of CCL2 as a potential biomarker relevant to mast cell infiltration in the testicular immune microenvironment of spermatogenic dysfunction

Background Spermatogenic dysfunction is an important cause of azoospermia. Numerous studies have focused on germ-cell-related genes that lead to spermatogenic impairment. However, based on the immune-privileged characteristics of the testis, the relationship of immune genes, immune cells or immune microenvironment with spermatogenic dysfunction has rarely been reported. Results Using integrated methods including single-cell RNA-seq, microarray data, clinical data analyses and histological/pathological staining, we found that testicular mast cell infiltration levels were significantly negatively related to spermatogenic function. We next identified a functional testicular immune biomarker, CCL2, and externally validated that testicular CCL2 was significantly upregulated in spermatogenic dysfunctional testes and was negatively correlated with Johnsen scores (JS) and testicular volumes. We also demonstrated that CCL2 levels showed a significant positive correlation with testicular mast cell infiltration levels. Moreover, we showed myoid cells and Leydig cells were two of the important sources of testicular CCL2 in spermatogenic dysfunction. Mechanistically, we drew a potential “myoid/Leydig cells-CCL2-ACKR1-endothelial cells-SELE-CD44-mast cells” network of somatic cell–cell communications in the testicular microenvironment, which might play roles in spermatogenic dysfunction. Conclusions The present study revealed CCL2-relevant changes in the testicular immune microenvironment in spermatogenic dysfunction, providing new evidence for the role of immunological factors in azoospermia. Supplementary Information The online version contains supplementary material available at 10.1186/s13578-023-01034-2.

and washed in order to decline the autofluorescence of the tissue. Afterwards, DAPI dye (Servicebio, #G1012) was used for nuclear staining. The slides were sealed with antifluorescence quenching mounting agent (Servicebio, #G1401), visualized under fluorescence microscope (NIKON, Eclipse C1) and digitalized by fluorescent slide scanner (Pannoramic MIDI,3DHISTECH).
Colony formation assay GC-1 and GC-2 Cells were seeded into 6-well plates (around 300 cells per well) at day1 and added with Ccl2 solution (Ccl2 group) or ddH2O (control group). Cells were next cultured for seven days to form clones. At day8, the medium was removed and cells were washed with PBS for three times, fixed with 4% paraformaldehyde for 20min and stained with 1% crystal violet (#G1062, Solarbio) for 20min. The plates were then washed by running water and photographed after drying. The experiments were repeated three times.

Microarray data collecting and processing
The microarray data used in this study were listed in Additional file 3: Figure S1E. For GSE4797, the raw TXT files from CodeLink microarray platform were downloaded from Gene Expression Omnibus (GEO) dataset and were loaded with codelink package (v1.60.0) [5]. GPL2891 annotation file was downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?view=data&acc=GPL2891&id=12727&db= GeoDb_blob06. As GSE4797 raw data were with legacy probe names, instead of current ones, and in some samples the Feature_id column was missing, we used a workaround (listed in https://support.bioconductor.org/p/68532/#68849) offered by Dr.Diego Diez, the author and maintainer of the codelink package, to deal with raw data. Briefly, the raw ".TXT" files were loaded with readCodelinkSet function, conducting RMA correction (parameters set: method = "normexp", normexp.method="rma"), and quantile normalization (parameters set: method="quantile"). Thereafter, the above-mentioned GPL file was used to match " LOGICAL_COL/LOGICAL_ROW " columns of the original data with the Feature_id and currently used probe names (listed in the GPL file). Then, the probe-matrix with currently used probe names as rownames and sample ids as colnames was annotated with gene symbol using h20kcodSYMBOL function of h20kcod.db package (v3.4.0). For genes with missing value in more than 20% samples, the rows were deleted. The rest missing value were supplemented by k-Nearest Neighbor algorithm [6] and average expression were obtained for duplicate genes. Afterwards, the expression gene matrix was renormalized using normalizeBetweenArrays function of the limma package (v3.48.1) before using. For GSE45885, the raw ".cel" files were loaded with the oligo package (v1.56.0). The probe-matrix was then corrected and normalized using rma function and hugene10sttranscriptcluster.db (v8.8.0) was used for converting probe names to gene symbols. Average expression was used for duplicate genes and the gene-matrix were handled with normalizeBetweenArrays before analyzing. For E-TABM-1214, the raw ".cel" data of all 47 samples were downloaded from Arrayexpress dataset [7] and were together loaded with the affy package (v1.70.0). After being processed with rma function, hgu133plus2.db (v3.13.0) were used for probe to symbol conversion. Average expression was obtained for duplicate genes and gene expression matrix was renormalized with normalizeBetweenArrays. Then the data of 38 adult samples were further extracted out of the whole 47 samples' matrix for analyses while the juvenile samples were excluded. The JS or mJS of all samples were obtained from the corresponding original articles or datasets.

WGCNA and key modules identification
The top 5000 genes selected by median absolute deviation (MAD) of discovery set were extracted to construct WGCNA [17] using the WGCNA package (1.70-3). The soft-threshold power was set as 7 (scale-free R2= 0.91, slope=-1.19). Then the adjacency matrix was converted to a topological overlap matrix (TOM) and the dissTOM (1-TOM) matrix was calculated. Genes clustering tree was drawn with hclust function based on TOM-based dissimilarity. Thereafter, dynamic hybrid cutting method were employed to create the clustering dendrogram with "minModuleSize" set as 30. Then the similar modules were clustered and merged with "MEDissThres" set as 0.25. Pearson correlations between module eigengenes and selected clinical traits were calculated. And modules with high correlation with both testicular mast cell infiltration level and spermatogenic function were considered as key modules.

Function enrichment analyses of genes in key modules
The genes of key modules were extracted and used for functional enrichment analysis using the clusterProfiler package [18,19]. Gene Ontology (GO) terms were set as a reference and pathways with adjusted p<0.05 were defined as significantly enriched pathways.

DEGs identification and function enrichment analyses of microarray data
DEGs between full spermatogenesis and spermatogenic dysfunction groups of the discovery set were identified by the limma package (v3.48.1) [20]. Genes with adjusted p value (using false discovery rate method) <0.05 and |log2 fold change|(logFC) >1 were defined as DEGs. A volcano plot was drawn to show all the DEGs. The expression characteristics of the top 100 DEGs (top 50 up-regulated and top 50 down-regulated according to |logFC|) and their location on chromosome were visualized using the OmicCircos package (v1.30.0). All DEGs were extracted to complete GO and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways analyses using the above-mentioned methods.

Acquisition of immune genes list and intersection of key modules/DEGs/immune genes list
Immunological genes list was obtained from the immunology database and analysis portal (ImmPort) dataset [21] (https://s3.immport.org/release/genelists/GeneList.txt?download=true, Updated: July 2020). The duplicate gene symbols were removed and a total of 1793 deduplicated immune-related genes were acquired and was defined as the immune gene list of our study. The intersections of immune gene list, DEGs list, and gene list of key modules were obtained and Venn diagrams were drawn to show the intersected genes using the VennDiagram package (v1.6.20).

Identification of hub immune genes related to both testicular mast cell infiltration and spermatogenic function
Intersected gene list was input into Search Tool for the Retrieval of Interacting Genes (STRING)database [22] to construct a protein-protein interaction (PPI) network (parameters set: minimum required interaction score=0.7, hide disconnected nodes =True). The PPI network was downloaded and then further analyzed by Cytoscape (v 3.7.2). The maximum clique centrality (MCC) algorithm of CytoHubba plug-in [23] was used to find hub genes. The top 10 genes with highest MCC values were identified as hub immune genes, which were the key immune-related genes that were relevant to both testicular mast cell infiltration and spermatogenic function.

Internal and external validation and further filtration of hub immune genes
The discovery set itself was used to do hub genes' internal validation. Two validation sets were employed to externally validate and further filter these hub immune genes from two angles. First, the spearman correlations between the expression of hub immune genes and mast cell infiltration level were worked out using cor function and ggcorrplot package (v0.1.3), and the results were summarized by ggdotchart function of ggpub package (v0.4.0). Second, the spearman correlations between the expression of hub immune genes and JS (or mJS) were calculated (also summarized by ggdotchart) in order to validate the relationship between hub genes and spermatogenic function. Correlation heatmaps based on 10 hub genes were constructed by corrplot (v0.90). The interested hub gene was picked out for further study according to their performance in the validation and their functions reported by previous literatures. For the individual interested hub gene, its correlations with mast cell infiltration or JS in the discovery set and validation sets were illustrated by packages ggplot2 (v3.3.5). Next, the expression of interested hub gene was validated in the testing set. And spearman correlations between its expression and mast cell infiltration level, testicular volumes and JS were also validated in the testing set. Correlations between the expression of the interested hub gene and mast cell infiltration level/testicular volume in the testing set were visualized by ggscatterstats function of ggstatsplot (v0.9.3). Receiver operating characteristic (ROC) curves of the interested hub gene were plotted in GraphPad Prism (v9.2.0). Gene set variation analysis (GSVA) and gene set enrichment analysis (GSEA) of microarray data GSVA [24] was carried out using the GSVA package. The 16 reference gene sets related to mast cells (Additional file 1: Supplementary File 1) were obtained from GO collection and was downloaded from Molecular Signatures Database (MSigDB) (http://www.gseamsigdb.org/gsea/msigdb/index.jsp). Thereafter, the GSVA results were visualized using pheatmap (v1.0.12), and the correlations between GSVA scores of each sample and the expression level of interested hub gene were calculated. Moreover, GSEA [25] was conducted to further evaluate the molecular mechanisms of a certain interested hub gene involved in spermatogenic dysfunction. First, the samples were divided into two clusters according to the median expression value of an interested hub gene to be studied (high versus low expression clusters). Second, differential expression analysis was conducted between the two clusters and a logFC value was get for each gene. Third, all genes were ranked by logFC, and the ranked gene list was subjected to GSEA, which was conducted by clusterProfiler (18,19) (parameters set: exponent = 1, minGSSize = 10, maxGSSize = 500, by = "fgsea" ) based on "c2.cp.kegg.v7.4.entrez.gmt" set (downloaded from MSigDB).

Construction of interested hub gene-related immune PPI network in testis
We constructed a testicular immune PPI network related to the interested hub gene. First, the immune gene list was used to separately get its intersections with genes of the discovery set, validation set 1 and 2 (so we get three immune sub-lists). Next the intersection of those three immune sub-lists was obtained, which referred to the commonly expressed immune genes in the testis. Third, spearman correlations between the expression level of interested hub gene and commonly expressed immune genes were calculated in discovery and validation sets, and genes that maintained all rho>0.5 along with p<0.05 among all three datasets were considered as interested hub gene-related testicular immune genes. These genes (along with the interested hub gene) were input into STRING database to construct a hub gene-related testicular immune PPI network (parameters set: minimum required interaction score=0.7, hide disconnected nodes =True). And the network was downloaded and processed by Cytoscape (the immune proteins not connected to the main network were removed). Therefore, the interested hub gene-related testicular immune PPI network was established and immune proteins were categorized according to gene functions (the categories were simplified according to the "Category" column of the original immune gene list from ImmPort).

Statistical analysis
Data were shown as mean±SD. For group comparison, Mann-Whitney test (two groups) or Kruskal-Wallis test (three or more groups) and Dunn multiple-comparisons for post-hoc examination were used. The correlations between gene expression level/GSVA score and JS/immune cell infiltration level were based on spearman correlation analyses. For semiquantitative analysis of CCL2 in the testing set, five random microscopic high power fields (HPFs) (40×) of each IHC staining image were captured. The Integrated Optical Density (IOD) and Tissue Quantification Area (Area, calculated by pixel) of each field were determined with Image Pro Plus 6.0 (v6.0.0.260). Finally, we used Average Optical Density (AOD= IOD/Area) for statistical calculation. For mast cell infiltration intensity of the testing set, we also captured five random microscopic HPFs of each slide and positive cells were quantified manually in Image Pro Plus 6.0 with Tissue Quantification Area (Area) presented as mm 2 . The final infiltration level of mast cells was presented as cells/mm 2 . For colony formation assay, the pictures of each well were separately analyzed using ImageJ (v1.53k) plugin ColonyArea [26], and colony intensity percentage (CIP) was used for statistical analysis. All statistical tests were performed in R studio or GraphPad Prism (v9.2.0). Statistical significance was set at p value<0.05.